Learning from Structured Data with High Dimensional Structured Input and Output Domain
Author
Abstract
Structured data is accumulating rapidly in many applications, e.g. bioinformatics, cheminformatics, social network analysis, natural language processing, and text mining. Designing and analyzing algorithms for handling these large collections of structured data, in both the input and output domains, has received significant interest in the data mining and machine learning communities. However, it is nontrivial to adapt traditional machine learning algorithms, e.g. SVM and linear regression, to structured data. For one thing, the structure information in the input and output domains is ignored when standard algorithms are applied to structured data. For another, the major challenge in learning from high-dimensional structured data is that the input/output domain can contain tens of thousands of features and labels, or even more. With a high-dimensional structured input space and/or structured output space, learning a low-dimensional and consistent structured predictive function is important for both the robustness and the interpretability of the model. In this dissertation, I will present several machine learning models that learn from data with structured input features and structured output tasks. For learning from data with structured input features, I have developed structured sparse boosting for graph classification and structured joint sparse PCA for anomaly detection and localization. Besides learning from structured input, I also investigated the interplay between structured input and output in the context of multi-task learning. In particular, I designed a multi-task learning algorithm that performs structured feature selection and task relationship inference. I will demonstrate the applications of these structured models to subgraph-based graph classification, networked data stream anomaly detection and localization, multiple cancer type prediction, neuron activity prediction, and social behavior prediction. Finally, through my internship work at IBM T.J. Watson Research, I will demonstrate how to leverage structural information from mobile data (e.g. call detail records and GPS data) to derive important places from people’s daily lives for transit optimization and urban planning.
Similar resources
Domain Transfer Structured Output Learning
In this paper, we propose the problem of domain transfer structured output learning and the first solution to solve it. The problem is defined on two different data domains sharing the same input and output spaces, named as source domain and target domain. The outputs are structured, and for the data samples of the source domain, the corresponding outputs are available, while for most data samp...
A Deep Learning Model for Structured Outputs with High-order Interaction
Many real-world applications are associated with structured data, where not only input but also output has interplay. However, typical classification and regression models often lack the ability of simultaneously exploring high-order interaction within input and that within output. In this paper, we present a deep learning model aiming to generate a powerful nonlinear functional mapping from st...
The Impact of Learning Styles on the Iranian EFL Learners' Input Processing
This research study explored the impact of learning styles and input modalities on the second language (L2) learners' input processing (IP). This study also sought to appraise the usefulness of Processing Instruction (PI) and its components in relation to the learners' learning styles and input modalities. To this end, 73 male and female Iranian EFL learners from Islamic Azad University, North ...
The Impact of Structured Input-based Tasks on L2 Learners’ Grammar Learning
Task-based language teaching has received increased attention in second language research. However, the combination of structured input-based approach and task-based language teaching has not been examined in relation to L2 grammar learning. To address this gap, the present study investigated how the structured input-based tasks with and without explicit information impacted learners’ ...